Entry Name: "UBA-Cesario_Picoaga-MC3"
VAST 2013 Challenge
Mini-Challenge 3: Visual Analytics for Network Situation Awareness
Team Members:
Diego Martin Cesario, University of Buenos Aires (UBA), diegomcesario@gmail.com PRIMARY
Jorge Kuday Picoaga, University of Buenos Aires (UBA), georgepicoaga@gmail.com (Point of contact for questions/answers)
Student Team: YES
Analytic Tools Used:
Tableau
Excel 2010
Oracle data mining
R (packages such as tau, ggplot, lattice, gridExtra and lubridate, and programming)
Gephi for R
SQL Server 2008
May we post your submission in the Visual Analytics Benchmark Repository after
VAST Challenge 2013 is complete? Yes
Video:
VAST_2013_MC3_Cesario_Picoaga.wmv
Questions
MC3.1 – Provide a timeline (i.e., events organized in chronological order) of
the notable events that occur in Big Marketing’s computer networks for the two
weeks of supplied data. Use all data at your disposal to identify up to twelve
events and describe them to the extent possible. Your answer should be no more
than 1000 words long and may contain up to twelve images.
The purpose of these observations was to gain an in-depth understanding of the
Big Marketing network's traffic patterns and of suspicious activity on the
network.
1. For a general view, we reproduced the complete network activity over the 2
weeks with heatmaps, which revealed our first event: no log activity between
the 8th and the 9th of April. At first we suspected a general server outage
lasting 2 days, but we finally assumed that the lack of activity was due to the
implementation of the new ipslogs system in the second week. Both the daily and
the per-minute ("Per Gygas") heatmaps show the days with the highest network
activity: April 3rd and 14th.
Event 1: Unusual network activity on 2 days; we were able to determine which
periods are the busiest.
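The heatmaps described here amount to counting log records per day and per hour and then looking for empty or unusually dense cells. The following is a minimal Python sketch of that aggregation, not the Tableau/R workflow the team used; the timestamp format and the sample records are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

def activity_grid(timestamps):
    """Count records per (date, hour) cell; these counts are the heatmap intensities."""
    grid = Counter()
    for ts in timestamps:
        t = datetime.fromisoformat(ts)
        grid[(t.date().isoformat(), t.hour)] += 1
    return grid

def silent_days(grid, days):
    """Return the days from `days` that have no records at all."""
    active = {day for day, _hour in grid}
    return [d for d in days if d not in active]

# Hypothetical sample records (illustrative only, not challenge data).
sample = ["2013-04-03 09:15:00", "2013-04-03 09:40:00", "2013-04-10 22:05:00"]
grid = activity_grid(sample)
quiet = silent_days(grid, ["2013-04-03", "2013-04-08", "2013-04-09", "2013-04-10"])
print(grid[("2013-04-03", 9)], quiet)
```

A gap such as the one on April 8th and 9th shows up as days missing from the grid, while the busiest days are the cells with the largest counts.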
2. In a further attempt to plot network activity per hour and per day, we
assumed that most network activity would occur during business hours. A new
bar plot revealed unusually high network activity at night on some days.
Event 2: Unusual network activity outside business hours.
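One way to quantify this observation is to compute the share of records that fall outside business hours. The sketch below assumes office hours of 08:00 to 18:00 on weekdays; the source does not state Big Marketing's actual hours, so these bounds and the sample timestamps are hypothetical.

```python
from datetime import datetime

# Assumed office hours; not stated in the source.
BUSINESS_START, BUSINESS_END = 8, 18

def is_off_hours(ts):
    """True if the timestamp falls on a weekend or outside the assumed office hours."""
    t = datetime.fromisoformat(ts)
    weekend = t.weekday() >= 5  # Monday is 0, so 5 and 6 are Saturday and Sunday
    return weekend or not (BUSINESS_START <= t.hour < BUSINESS_END)

def off_hours_share(timestamps):
    """Fraction of records that occur outside business hours."""
    if not timestamps:
        return 0.0
    return sum(is_off_hours(ts) for ts in timestamps) / len(timestamps)

# Hypothetical sample (illustrative only).
sample = ["2013-04-03 10:00:00", "2013-04-03 23:30:00",
          "2013-04-06 10:00:00", "2013-04-04 14:00:00"]
share = off_hours_share(sample)
print(share)
```

Days whose off-hours share is far above the typical value are the ones worth inspecting.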
3. We used the bbcontent field of the BBrother table as a metric (server uptime
by site). This metric gives the time that a server has been up since its last
reported status. We used it to identify the servers that went down suspiciously
during the 2 weeks and to learn more about server behavior patterns.
Event 3: Some servers had more unusual reboots than others.
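Given a series of uptime values per server, a reboot can be inferred whenever the reported uptime drops from one report to the next. The sketch below assumes the uptime has already been extracted from bbcontent as a number of seconds; the host names and values are hypothetical.

```python
def count_reboots(uptimes):
    """Count drops in consecutive uptime reports; each drop implies a restart."""
    return sum(1 for prev, cur in zip(uptimes, uptimes[1:]) if cur < prev)

# Hypothetical uptime reports in seconds, in chronological order.
reports = {
    "hostA": [300, 600, 900, 120, 420],    # uptime resets once
    "hostB": [300, 600, 900, 1200, 1500],  # uptime grows steadily
}
reboots = {host: count_reboots(values) for host, values in reports.items()}
print(reboots)
```

Servers with a reboot count well above the rest are the candidates for Event 3.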
4. We visualized another metric that we considered interesting: the disk usage
percentage by hostname over the 2 weeks.
Event 4: Three servers show the fatal error warning message, which means that
the disk is full:
- Administrator server
- Web03.bigmkt3.com
- Web01.bigmkt1.com
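The disk usage metric can be approximated by pulling percentage values out of the status text and keeping the worst value seen per host. This is only a sketch: the exact layout of the bbcontent disk messages is not given in the source, so the regular expression, the 90% threshold and the sample records are assumptions.

```python
import re

PCT = re.compile(r"(\d{1,3})%")  # assumed: usage appears as a number followed by '%'

def max_disk_pct(text):
    """Highest percentage mentioned in a status message, or None if there is none."""
    values = [int(v) for v in PCT.findall(text)]
    return max(values) if values else None

def flag_full(records, threshold=90):
    """Return {host: worst percentage} for hosts at or above the threshold."""
    worst = {}
    for host, text in records:
        pct = max_disk_pct(text)
        if pct is not None:
            worst[host] = max(worst.get(host, 0), pct)
    return {host: pct for host, pct in worst.items() if pct >= threshold}

# Hypothetical (host, message) pairs; not real challenge records.
records = [("web01", "/ 45% /var 97%"), ("web02", "/ 30%"), ("web01", "/ 50%")]
flagged = flag_full(records)
print(flagged)
```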
5. Because we needed evidence of external attacks on the servers, we focused on
messages such as "disk use" and "down status", and pursued this goal with text
mining techniques. We parsed the bbcontent field of the BBrother table and
computed simple word frequencies to find outlier records. We built a bbcontent
tokenizer with a dictionary of word meanings, created an inverted frequency
plot, and found a new event.
Event 5: Suspected virus-related messages inside the bbcontent data.
Word | Frequency | Meaning
unreachable | 1432484 | Server cannot establish connection; intuitive alert token
Loss | 1015549 | Loss of some kind; intuitive alert token
Can | 477259 | Refers to "can't": some process cannot be performed; intuitive alert token
Wmiprvse | 50338 | The Sasser trojan uses this name and similar variants (wmiprvsw.exe) to pass undetected.
Winlogon | 23505 | Winlogon.exe is a process name used by virus developers to hide from network administrators.
Wininit | 17788 | Usually refers to a virus attack at the beginning of server activity (in Windows environments).
vmupgradehelper | 3087 | Using VMUpgradeHelper.exe /r from a Windows command line (to restore NIC settings) fails with the error: Restore network config failed.
Vmwaretray | 2775 | The vmware-tray.exe file can be a very dangerous program and usually involves a VMware installation.
Panic | 1481 | Failure detected; intuitive alert token
Failed | 637 | Failure detected; intuitive alert token
Rebooted | 238 | To turn (a computer or operating system) off and then on again; restart; intuitive alert token
Slui | 67 | slui.exe refers to an application that is installed when Windows is not genuine.
msexchangemailboxassistants | 13 | msexchangemailboxassistants.exe: high CPU and network usage for a specific user
Msftesql | 2 | The free file information forum can help you determine if msftesql.exe is a virus, trojan, spyware, or adware that you can remove, or a file belonging to a Windows system or an application you can trust.
Wlrmdr | 2 | A wlrmdr.exe file should only be located in the Windows\system32 folder of your PC, as this is the default path from which this file is designed to execute. Unfortunately, some undetected spyware may be responsible for wlrmdr.exe errors on your computer by placing a file with a similar name. Moreover, an uninstall of a program that has been performed incorrectly or incompletely may also lead to wlrmdr.exe errors.
antispamupdatesvc | 1 | The "microsoft.exchange.antispamupdatesvc.exe" process can represent a threat or a virus.
Conhost | 1 | C:\WINDOWS\SYSTEM32\CONHOST.EXE always attempts to access my computer. So, it appears that malware has somehow externally programmed an attack to take place 4 minutes after I log in on my computer.
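The word-frequency step behind this table can be sketched in a few lines of Python. This is not the team's R tokenizer (they list the tau package); the tokenization rule, the small alert dictionary and the sample messages are assumptions chosen for illustration. Sorting by ascending count mirrors the idea of an inverted frequency view, where rare tokens surface first.

```python
import re
from collections import Counter

# Small illustrative subset of alert tokens; the full dictionary is an assumption.
ALERT_WORDS = {"unreachable", "loss", "panic", "failed", "rebooted"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def word_frequencies(messages):
    """Count every token across all messages."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts

# Hypothetical status messages (not real bbcontent values).
messages = ["host unreachable packet loss", "host unreachable", "wmiprvse.exe running"]
counts = word_frequencies(messages)
alerts = {w: counts[w] for w in ALERT_WORDS if w in counts}
# Rarest tokens first: candidates for outlier records.
rarest = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
print(alerts, rarest[:3])
```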
6. In the second week, some interesting information emerged. Taking the records
of the ipslogs table whose operation field has the value "deny", we obtained
the external IP addresses responsible for the majority of attacks on the
network.
Event 6: Suspicious external IP addresses that account for attacks on our
network.
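The filter described here, keeping only denied operations and ranking their source addresses, can be sketched as follows. The operation field comes from the text; the src_ip field name, the "Built" value and the sample rows are hypothetical, and the addresses are taken from documentation ranges rather than from the challenge data.

```python
from collections import Counter

def top_denied_sources(rows, n=3):
    """Rank source IPs by the number of records whose operation is 'deny'."""
    denied = Counter(
        row["src_ip"] for row in rows
        if row["operation"].strip().lower() == "deny"
    )
    return denied.most_common(n)

# Hypothetical rows (illustrative only).
rows = [
    {"src_ip": "203.0.113.5", "operation": "Deny"},
    {"src_ip": "203.0.113.5", "operation": "Deny"},
    {"src_ip": "198.51.100.2", "operation": "Deny"},
    {"src_ip": "198.51.100.2", "operation": "Built"},
]
top = top_denied_sources(rows)
print(top)
```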
7. One of the suspected IP addresses is 172.10.0.6, which unusually uses
internal network notation but is not recorded in BigMktNetwork.txt. Therefore,
one of the attacks uses the internal numeric notation of Enterprise Site 1,
according to the network architecture document.
Event 7: An IP address using internal IP address notation was detected, which
implies an attack from the local network.
8. With this unusual IP address now in scope, we filtered all Big Brother
records by the 172.10.0.6 IP address. We chose this address because its range
belongs to the local network, yet it was not listed in the BMN list. In a table
visualization (scrolling through the data), we could see that many parameters
in the bbcontent field are repeated across multiple records. We first
attributed this to status messages repeated across multiple hostnames. However,
when we checked the total universe of records generated by this unusual local
IP address, we detected a total of 3,876,422 identical records.
Event 8: The message shown in fig. 7 in the "bbcontent_extract" field was
reported by 900 different servers and involves 290,133 Big Brother records,
causing the bulk of the traffic.
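Counting how often an identical message appears, and on how many distinct servers, is the core of this check. A minimal sketch, assuming the records are available as (hostname, message) pairs; the sample values are hypothetical.

```python
from collections import defaultdict

def message_spread(records):
    """For each message, return (number of records, number of distinct hosts)."""
    stats = defaultdict(lambda: {"records": 0, "hosts": set()})
    for host, message in records:
        entry = stats[message]
        entry["records"] += 1
        entry["hosts"].add(host)
    return {msg: (s["records"], len(s["hosts"])) for msg, s in stats.items()}

# Hypothetical (hostname, message) pairs (illustrative only).
records = [("h1", "conn ok"), ("h2", "conn ok"), ("h1", "conn ok"), ("h3", "disk 91%")]
spread = message_spread(records)
print(spread)
```

A message with a very high record count spread over many hosts, as in Event 8, stands out immediately in this summary.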
9. After reading more about security, we learned about a technique used by
external hackers called "port knocking", in which the hacker pings several
ports until the firewall gives up. Using a clustering technique, we detected
many IP addresses receiving messages on unidentified ports.
Event 9: Unknown ports targeted using a "port knocking" technique.
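The source describes a clustering approach for this step. As a simpler stand-in, the sketch below counts the distinct destination ports contacted by each source and flags sources above a threshold; the (source IP, port) input shape, the threshold of 4 and the sample events are assumptions.

```python
from collections import defaultdict

def ports_per_source(events):
    """Map each source IP to the set of distinct destination ports it contacted."""
    ports = defaultdict(set)
    for src_ip, dst_port in events:
        ports[src_ip].add(dst_port)
    return ports

def probable_probers(events, min_ports=4):
    """Sources that touched at least `min_ports` distinct ports."""
    per_source = ports_per_source(events)
    return sorted(ip for ip, seen in per_source.items() if len(seen) >= min_ports)

# Hypothetical events: one source tries five ports, another repeats a single port.
events = [("198.51.100.7", p) for p in (21, 22, 23, 80, 443)]
events += [("203.0.113.9", 80)] * 3
probers = probable_probers(events)
print(probers)
```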
10. The web server WEB03.BIGMKT3.COM, 172.30.0.4 (with external IP address
10.0.4.4), reports no connection status for the whole of April 4th, until
April 5th at 08:00 am.
Event 10: A specific web server in site 3 is reported with down status.
11. The network description states: "Organizationally, Big Marketing consists
of three different branches, each with around 400 employees and its own web
servers." Accordingly, BigMktnetwork.txt lists 408 IP addresses for site 1 and
407 IP addresses for site 3 but, suspiciously, only 308 for site 2.
Event 11: The BigBrother network health monitoring program reports the status
of 100 workstations that were not declared in the BigMktnetwork.txt list.
These hostnames range from wss2-101 to wss2-200.
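Finding monitored hosts that are missing from the declared list is a set difference. The sketch below uses the wss2 naming pattern from the text, but the short host lists are hypothetical stand-ins for the real inventory and monitoring records.

```python
def undeclared_hosts(monitored, declared):
    """Hosts that appear in monitoring records but not in the declared list."""
    return sorted(set(monitored) - set(declared))

# Hypothetical, shortened lists (illustrative only).
declared = [f"wss2-{i}" for i in range(1, 4)]    # wss2-1 .. wss2-3
monitored = declared + ["wss2-101", "wss2-102"]  # monitoring also sees two extra hosts
missing = undeclared_hosts(monitored, declared)
print(missing)
```

Applied to the real data, the size of this difference would be expected to match the 100 workstations reported in Event 11.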
12. With reference to the 11th event, there are 11 IP addresses whose numerical
notation begins with the 172 prefix. This prefix is supposed to be part of the
internal Big Marketing network, but according to the network architecture
document provided, these suspicious IP addresses are not included in the
architecture. The IP address of event 8 is among these 11.
Event 12: External attackers want to clone internal IP addresses.
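The check behind Events 7 and 12, finding addresses that look internal but are absent from the network list, can be sketched as a prefix filter plus a membership test. The plain "172." string prefix follows the wording of the text; the inventory entries below are hypothetical, while 172.10.0.6 and 10.0.4.4 are addresses mentioned in this report.

```python
def unlisted_internal(observed, inventory, prefix="172."):
    """Observed addresses that carry the internal prefix but are not in the inventory."""
    listed = set(inventory)
    return sorted({ip for ip in observed if ip.startswith(prefix) and ip not in listed})

# Hypothetical inventory entries (illustrative only).
inventory = ["172.10.0.2", "172.10.0.3"]
observed = ["172.10.0.6", "172.10.0.2", "10.0.4.4", "172.10.0.6"]
suspects = unlisted_internal(observed, inventory)
print(suspects)
```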
MC3.2 – Speculate on one or more
narratives that describe the events on the network. Provide a list of analytic
hypotheses and/or unanswered questions about the notable events. In other
words, if you were to hand off your timeline to an analyst who will conduct
further investigation, what confirmations and/or answers would you like to see
in their report back to you? Your answer should be no more than 300 words long
and may contain up to three additional images.
1. Based on our whole experience, we assume that the principal threat to our
network is the internal 172.10.0.6 IP address. Under this theory, the attack
occurs inside the Big Brother installations and runs some processes as
scheduled tasks outside business hours. We would like more information about
this IP address in order to keep tracing it.
2. If we had known about scheduled reboots, we could have explained the lack of
information between April 7 and 10. Without that knowledge, we may be working
with a biased universe when investigating the server reboots.
3. We would like to contact the person responsible for network maintenance. It
is very strange that no action was taken in response to multiple automatic
alerts about disk usage percentage. As a result, the administrator server
reached 100% usage and 2 more servers reported disk usage over 90%.
4. To explain our experiments, we can speculate that the system log agent that
reports server status every 5 minutes has been infected by some external
attack, because fuzzy, repeated information messages were found in its content.
System logs are one of the main sources for detecting intrusions; therefore we
need to ensure that the status log works correctly.
5. It would be very useful to obtain a complete report of the activity
scheduled outside office hours (such as big data and business intelligence OLAP
processes) so that it can be ruled out as suspicious network activity.
6. Although the network description mentions three different branches, each
with around 400 employees and its own web servers, only 308 workstations are
listed for site 2 in the BigMktNetwork.txt file. On the other hand, the
BBrother records contain 100 hostnames in the site 2 range that are not listed
in that file. Assuming that all employees have their own workstation, it is
possible that the network architecture list is incomplete.
MC3.3 – Describe the role that
your visual analytics played in enabling discovery of the notable events in
MC3.1. Describe whether your visual analytics play a role in formulating the
questions in MC3.2. Your answer should be no more than 300 words long and may
contain up to three additional images.
Efficient information visualization is an important element for the urgent
detection of intruders. The conventional way of browsing system logs does not
allow immediate action against unauthorized server access. We propose a
portfolio built with Tableau and R so that administrators can easily identify
and quickly act upon intrusions in the Big Brother network. Information can be
viewed easily, and detecting an intrusion allows administrators to engage
immediately to secure 100% availability of the network, based on a
visualization of the resources that are under attack. We propose building a web
application that integrates all our metrics to make unusual server activity
easy to detect.
We explored the possibility of refining visualization techniques that can
represent detailed information about each server and its behavior. At first we
chose R to plot our analysis, but we discarded those plots and decided to use
Tableau. Some of our visualizations use concentric circles to show server disk
usage. The concept and the system we have presented show that integrating
information visualization into intrusion detection can be indispensable for
detecting possible intrusions within a network.
Heatmaps offer a simple yet powerful way of displaying the distribution of
large time series data. They use colors to represent the density of points,
making it easier to pick out areas of high activity.
Dashboard detecting unidentified ports through clustering across all servers.
The orange area indicates the unidentified ports.
For both MC3.1 and MC3.2, we consider information visualization important:
with a scope as interesting as this case, it motivates the process of
discovering models. In MC3.2, visualization also plays an important role in
raising more questions, by drawing time series plots that cross information
from several variables. The questions and doubts typically included as
traditional evaluation criteria may increase after our visualization, because
hidden patterns are revealed.